Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering
نویسندگان
چکیده
This paper presents a cluster validation based document clustering algorithm, which is capable of identifying an important feature subset and the intrinsic value of model order (cluster number). The important feature subset is selected by optimizing a cluster validity criterion subject to some constraint. For achieving model order identification capability, this feature selection procedure is conducted for each possible value of cluster number. The feature subset and the cluster number which maximize the cluster validity criterion are chosen as our answer. We have evaluated our algorithm using several datasets from the 20Newsgroup corpus. Experimental results show that our algorithm can find the important feature subset, estimate the cluster number and achieve higher micro-averaged precision than previous document clustering algorithms which require the value of cluster number to be provided. 2006 Elsevier Ltd. All rights reserved.
منابع مشابه
Image Clustering Using Evolutionary Computation Techniques
In the cluster analysis most of the existing clustering techniques for clustering, accept the numbers of clusters K as an input and determine that many number of cluster for a given data set. The projecting technique will try to discover true number of cluster centers automatically on the run. It will not only determines the true number of the cluster centers but also extracts real cluster cent...
متن کاملOptimal Dual Similarity Noise-free Clusters Using Dynamic Minimum Spanning Tree
Clustering is a process of discovering groups of objects such that the objects of the same group are similar, and objects belonging to different groups are dissimilar. A number of clustering algorithms exist that can solve the problem of clustering, but most of them are very sensitive to their input parameters. Minimum Spanning Tree clustering algorithm is capable of detecting clusters with irr...
متن کاملA New Criterion for Clusters Validation
In this paper a new criterion for clusters validation is proposed. This new cluster validation criterion is used to approximate the goodness of a cluster. The clusters which satisfy a threshold of this measure are selected to participate in clustering ensemble. For combining the chosen clusters, a coassociation based consensus function is applied. Since the Evidence Accumulation Clustering meth...
متن کاملLearning Word Sense With Feature Selection and Order Identification Capabilities
This paper presents an unsupervised word sense learning algorithm, which induces senses of target word by grouping its occurrences into a “natural” number of clusters based on the similarity of their contexts. For removing noisy words in feature set, feature selection is conducted by optimizing a cluster validation criterion subject to some constraint in an unsupervised manner. Gaussian mixture...
متن کاملCUSTOMER CLUSTERING BASED ON FACTORS OF CUSTOMER LIFETIME VALUE WITH DATA MINING TECHNIQUE
Organizations have used Customer Lifetime Value (CLV) as an appropriate pattern to classify their customers. Data mining techniques have enabled organizations to analyze their customers’ behaviors more quantitatively. This research has been carried out to cluster customers based on factors of CLV model including length, recency, frequency, and monetary (LRFM) through data mining. Based on LRFM,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Inf. Process. Manage.
دوره 43 شماره
صفحات -
تاریخ انتشار 2007